A method and system for synchronizing threads

A method and system is described for synchronizing
execution by a processing element of threads within a
process. Before execution of a thread commences, a
determination is made as to whether all of the required
resources for execution of the thread are available in a
cache local to the processing element. If the resources
are not available, then the resources are fetched from
main storage and stored in one or more local caches
before execution begins. If the resources are available,
then execution of the thread may begin. During execution
of the thread and, in particular, an instruction within
the thread, the instruction may require data in order to
successfully complete its execution. When this occurs, a
determination is made as to whether the necessary data is
available. If the data is available, the result of the
instruction execution is stored and execution of the
thread continues. However, if the data is unavailable,
then the thread is deferred until the data becomes
available and a new thread is processed. When deferring a
thread, the thread is placed in the memory location which
is to receive the required data. Once the data is
available, the thread is removed from the data location
and placed on a queue for execution and the data is
stored in the location.
TECHNICAL FIELD

This invention relates in general to data synchronization
within a parallel data processing system, and more
particularly to the synchronization of threads within a
process being executed by a processing element.

BACKGROUND ART

In parallel data processing systems, programs to be
executed may be divided into a number of processes which
may be executed in parallel by a plurality of processing
elements. Each process includes one or more threads and
each thread includes a group of sequentially executable
instructions. The simultaneous execution of a number of
threads requires synchronization or time-coordination of
the activities associated with each thread. Without
synchronization a processor may sit idle for a great deal
of time waiting for data it requires, thereby degrading
system performance and utilization.

A thread located in one process is capable of
communicating with threads in another process or in the
same process and, therefore, various levels of
synchronization are required in order to have an
efficiently executing system with a high degree of system
performance.

In order to synchronize the communication of threads
located in different processes, a synchronization
mechanism, such as I-structures, may be used.
I-structures are used in main storage and are described
in I-structures: Data Structures for Parallel Computing
by Arvind, R.S. Nikhil and K.K. Pingali, Massachusetts
Institute of Technology Laboratory for Computer Science,
February 1987.

Synchronization of threads communicating between
different processes does not negate the need for a
synchronization mechanism used to synchronize threads
within the same process. Therefore, a need still exists
for an efficient manner to synchronize threads within a
process thereby providing greater system utilization and
performance. A need also exists for a synchronization
mechanism of threads within a process wherein the
synchronization mechanism is local to the processing
element in which the threads are executed. A further need
exists for a synchronization mechanism which does not
place a constraint on the number of processes and threads
which may be executed by the processing element due to
the size of local memory.

DISCLOSURE OF INVENTION

The shortcomings of the prior art are overcome and
additional advantages are provided in accordance with the
principles of the present invention through the provision
of a method and system for synchronizing threads within a
process.

In accordance with the principles of the present
invention, a method for synchronizing execution by a
processing element of threads within a process is
provided. The process includes fetching during execution
of a thread within a process a datum field from a local
frame cache and an associated state indicator from a
state bit cache. The state indicator has a first state
value which is used to determine whether the datum field
includes a datum available for use by the thread. If the
datum is unavailable, then execution of the thread is
deferred until the datum is available.

In one embodiment, the thread is represented 
by a continuation descriptor and the step of defer- 
ring the thread includes storing the continuation 
descriptor within the datum field. 

In yet another embodiment, the method of synchronizing
threads includes awakening the deferred thread when the
datum is available for the thread. Awakening includes
removing the continuation descriptors stored in the datum
field and then placing the datum in the field.
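The defer and awaken steps described above can be sketched roughly as follows. This is a minimal illustration, not the claimed hardware: the names `Continuation`, `Location`, `read` and `write` are hypothetical, the state indicator is reduced to a single `present` flag, and the deferred continuation descriptors are kept in a separate list rather than physically overlaid on the datum field.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Continuation:
    """Hypothetical stand-in for a continuation descriptor."""
    code_offset: int
    index: int = 0


@dataclass
class Location:
    """One datum field with a simplified one-flag state indicator."""
    present: bool = False
    datum: Any = None
    deferred: List[Continuation] = field(default_factory=list)


def read(loc: Location, thread: Continuation) -> Tuple[bool, Optional[Any]]:
    """Return (True, datum) when the datum is available.

    Otherwise store the thread's continuation in the location (deferring
    the thread) and return (False, None).
    """
    if loc.present:
        return True, loc.datum
    loc.deferred.append(thread)
    return False, None


def write(loc: Location, value: Any, ready: deque) -> None:
    """Awaken deferred threads, then place the datum in the field."""
    ready.extend(loc.deferred)   # remove continuations, queue them to run
    loc.deferred.clear()
    loc.datum = value
    loc.present = True
```

A read of an empty location parks the reader; the later write moves every parked continuation onto the queue passed as `ready` before the value becomes visible.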

In another aspect of the invention, a system for
synchronizing execution by a processing element of
threads within a process is provided. The system includes
a local frame cache and a state bit cache, means for
executing by the processing element a thread within a
process and means for fetching from the local frame cache
a datum field and from the state bit cache an associated
state indicator. The state indicator has a first state
value and the system includes means for determining based
on the first state value whether the datum field includes
a datum available for use by the thread. Should the datum
be unavailable, then means for deferring execution of the
thread until the datum is available is provided.

In one embodiment, the system further includes means for
determining a second state for the state indicator
wherein the second state will replace the first state
during execution of the thread. The first state may
indicate a current state and the second state may
indicate a next state.

In another aspect of the invention, a method for 
synchronizing execution by a processing element 
of threads within a process is provided. Each 
thread includes a plurality of instructions and the 
method includes executing an instruction. During 
execution of the instruction, at least one source 
operand is fetched from a local frame cache and at 
least one corresponding state indicator having a 
first state is fetched from a state bit cache. Also, 
fetched from the instruction is at least one state 
function associated with at least one fetched 
source operand. The state function is used to select from
one of a plurality of tables N possible second states for
the state indicator wherein each of the second states has
an associated flag indicator. The first state is used to
choose from the selected N possible states a second state
for the state indicator and the second state replaces the
first state during thread execution. The flag indicator
specifies one of a plurality of actions for the thread to
perform.
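A two-stage table lookup of this kind might be sketched as below. Everything concrete here is an assumption made for illustration: the state names, the flag names, the table keys `OPMC_READ` and `OPMC_WRITE`, and the table contents are invented and are not taken from the specification. Only the shape follows the text: the state function picks a table of candidate second states, and the first state picks one entry (second state plus flag) from that table.

```python
from typing import Dict, Tuple

# Hypothetical state values for a state indicator.
EMPTY, DEFERRED, PRESENT = range(3)

# Hypothetical flag indicators naming the action the thread performs.
CONTINUE, DEFER, WAKE, ERROR = "continue", "defer", "wake", "error"

StateTable = Dict[int, Tuple[int, str]]

# One table per state function; each maps first state -> (second state, flag).
TABLES: Dict[str, StateTable] = {
    "OPMC_READ": {
        EMPTY: (DEFERRED, DEFER),
        DEFERRED: (DEFERRED, DEFER),
        PRESENT: (PRESENT, CONTINUE),
    },
    "OPMC_WRITE": {
        EMPTY: (PRESENT, CONTINUE),
        DEFERRED: (PRESENT, WAKE),
        PRESENT: (PRESENT, ERROR),
    },
}


def transition(state_function: str, first_state: int) -> Tuple[int, str]:
    """Select a table by state function, then an entry by first state."""
    candidates = TABLES[state_function]   # the N possible second states
    return candidates[first_state]        # (second state, flag indicator)
```

Keeping the tables as data rather than branching logic mirrors the description, in which the instruction supplies the state function and the state bit cache supplies the first state.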

In accordance with the principles of the present
invention, a method and system for synchronizing threads
within a process is provided. The synchronization
mechanism of the present invention suspends execution of
a thread when data for that thread is unavailable thereby
allowing another thread to be executed. This provides for
increased system utilization and system performance.
BRIEF DESCRIPTION OF DRAWINGS

The subject matter which is regarded as the invention is
particularly pointed out and distinctly claimed in the
claims at the conclusion of the specification. The
foregoing and other objects, features and advantages of
the invention will be apparent from the following
detailed description taken in conjunction with the
accompanying drawings in which:

FIG. 1
depicts one example of a block diagram of a parallel
processing system, in accordance with the principles of
the present invention;
FIG. 2
is one example of the logical components associated with
a main memory control unit of the parallel processing
system of FIG. 1, in accordance with the principles of
the present invention;
FIG. 3
is an illustration of one embodiment of a logical local
frame residing in the main memory control unit of FIG. 2,
in accordance with the principles of the present
invention;
FIG. 3a
depicts one example of a logical work frame associated
with the local frame of FIG. 3, in accordance with the
principles of the present invention;
FIG. 3b
illustrates one embodiment of the fields contained within
a non-compressed continuation descriptor of the present
invention;
FIG. 4
is an illustration of one embodiment of the entries of a
logical code frame residing in main memory control units
of FIG. 2, in accordance with the principles of the
present invention;
FIG. 4a
depicts one example of the fields within an instruction
located in the code frame of FIG. 4, in accordance with
the principles of the present invention;
FIG. 4b
depicts one example of the components of the destination
and source specifiers of the instruction of FIG. 4a, in
accordance with the principles of the present invention;
FIG. 5
illustrates one embodiment of a block diagram of the
hardware components of a processing element of FIG. 1, in
accordance with the principles of the present invention;
FIG. 6
depicts one example of the components of a ready queue
entry within a ready queue depicted in FIG. 5, in
accordance with the principles of the present invention;
FIG. 7
is one example of the components of a local continuation
queue within the processing element of FIG. 5, in
accordance with the principles of the present invention;
FIG. 8
illustrates one example of the components associated with
a code frame cache residing within the processing element
of FIG. 5, in accordance with the principles of the
present invention;
FIG. 9
depicts one example of a code frame cache directory
associated with the code frame cache of FIG. 8, in
accordance with the principles of the present invention;
FIGS. 10a, 10b
depict one example of a block diagram of the components
associated with a local frame cache located within the
processing element depicted in FIG. 5, in accordance with
the principles of the present invention;
FIG. 11
depicts one example of a local frame cache directory
associated with the local frame cache of FIGS. 10a, 10b,
in accordance with the principles of the present
invention;
FIGS. 12a, 12b
depict one example of a flow diagram of the
synchronization process of the present invention; and
FIG. 13
depicts one example of a flow diagram of the processes
associated with writing data to a location in a cache, in
accordance with the principles of the present invention.
BEST MODE FOR CARRYING OUT THE INVENTION

The synchronization mechanism of the present invention
resides within a parallel data processing system 10, as
depicted in FIG. 1. In one embodiment, parallel data
processing system 10 includes one or more main memory
control units 12, a plurality of processing elements (PE)
14 and a plurality of input/output processors (I/OP) 16.
Processing elements 14 communicate with each other, main
memory control units 12 and input/output processors 16
through an interconnection network 18. One example of the
main components associated with main memory control units
12 (or main storage) and processing elements 14 is
explained in detail below.

Each of main memory control units 12 contains a portion
of a sequentially addressed linear memory address space
(not shown). The basic unit of information stored in the
address space is a word (or memory location) having a
unique address across all main memory control units.
Contiguous words or memory locations may be combined into
a logical structure such as a local frame 20 (FIG. 2), a
code frame 22 and a work frame 23. In one embodiment,
local frame 20 and work frame 23 generally refer to a
group of data words and code frame 22 refers to a group
of instructions. There may be a plurality of local
frames, work frames and code frames within main memory
control units 12. In one embodiment, a particular local
frame is associated with a process such that the address
of a local frame is used as the identifier of a process.

Referring to FIG. 3, local frame 20 has, in one example,
256 local frame locations 24. The first four locations
are reserved for an invocation context map entry 26,
which is associated with a process to be executed by one
of processing elements 14, the next two slots are
reserved for functions not discussed herein and the
remainder of the locations (six through 255) are reserved
for data local to the process. The information contained
within invocation context map entry 26 is established
prior to instantiation of the process and includes, for
instance, the following fields:

(a) A three bit state indicator (ST) 28 which indicates
the current state of a local frame location. As described
further below, state indicator 28 is relative to a state
function, which is used in accessing the frame location;

(b) A twelve bit physical processing element (PE) number
30 which identifies which processing element is to
execute the process;

(c) A three bit process state (PS) 32 which indicates the
state of the associated process. A process may have any
one of a number of states including, for example: a free
state, which is used as a reset state to indicate that
the process is no longer active; an inactive state, used
to prevent a process from executing; a suspended state,
used to prevent any modification to a process so that,
for example, the operating system may perform a recovery;
an active state in main storage, used to indicate that
the process can execute and that it is in main memory; or
an active state not in main storage, used to indicate
that the process can execute and it is within the
processing element assigned to execute the process;
(d) A two bit local frame state (FS) 34 which indicates
the state of local frame 20. Local frame 20 may have, as
an example, one of the following states:
a present state representing that the local frame is
present in main storage;
a transient state representing that the local frame is
transient between main storage and memory local to the
processing element, which is identified by processing
element number 30;
an absent state indicating that references to local frame
20 are to be redirected to the processing element;

(e) A one bit invocation context queue control (ICQC) 36
which indicates the manner in which the process is
enqueued onto an invocation context queue (described
below) at instantiation;

(f) A one bit cache pinning control (CPC) 37 which
indicates whether a code frame or a local frame located
within the processing element (e.g., within a code frame
cache or a local frame cache, which is described below)
is to be pinned;

(g) An eight bit local continuation queue head (LCQH)
pointer 38 which contains an offset into a first entry of
work frame 23 (FIG. 2) which is in contiguous memory
locations to local frame 20 (described below);

(h) An eight bit local continuation queue tail (LCQT) 40
which contains an offset into the first empty entry at
the end of work frame 23;

(i) A forty bit invocation context queue (ICQ) forward
pointer 42 which is used in creating a doubly linked list
of processes (referred to as an invocation context queue)
in main storage for the processing element identified by
processing element number 30. The invocation context
queue has a head and a tail which are retained within the
processing element, and it is ordered based on the
enqueue discipline indicated by invocation context queue
control 36;

(j) A forty bit invocation context queue (ICQ) backward
pointer 44 which is also used in creating the invocation
context queue of processes; and

(k) A forty bit code frame pointer 46 which specifies the
address of a current code frame 22.

As previously stated, locations six through 255 of local
frame 20 are reserved for data. Each data entry 48
includes state indicator 28 and a 64-bit datum field 50.
Datum field 50 contains the data used during execution of
the process.

Referring once again to FIG. 2, coupled to each local
frame 20 is a logical structure referred to as work frame
23. Work frame 23 is allocated in the next 256 contiguous
locations to local frame 20. Local frame 20 and work
frame 23 are managed as a single entity. The first, for
example, sixty-four entries or locations 52 of work frame
23 (FIG. 3a) include one or more compressed continuation
descriptors (CCD) 54 used, as explained below, in
selecting an instruction to be executed or data which is
to be used by the instruction. Each compressed
continuation descriptor 54 includes, for instance, a code
offset 56 and an index 58 (described below). In contrast,
a continuation descriptor which is not compressed also
includes a local frame pointer 60 (FIG. 3b) which
indicates the beginning of local frame 20. A compressed
continuation descriptor does not need to store the local
frame pointer, since it may be inferred from the main
storage address of the local frame/work frame pair. In
one embodiment, each location 52 in work frame 23 is
capable of storing four compressed continuation
descriptors.
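The packing of four compressed continuation descriptors into one work frame location could look roughly like the sketch below. The field widths are assumptions chosen for illustration: an 8-bit code offset (enough to address 256 code frame locations), an 8-bit index (matching the modulo 256 arithmetic described for indexed addressing), and therefore 16 bits per descriptor in a 64-bit word. The slot ordering and all function names are likewise hypothetical.

```python
from typing import List, Tuple

CCD_BITS = 16          # assumed: 8-bit code offset + 8-bit index
CCDS_PER_LOCATION = 4  # from the text: four descriptors per location


def pack_ccd(code_offset: int, index: int) -> int:
    """Pack one compressed continuation descriptor into 16 bits."""
    if not (0 <= code_offset < 256 and 0 <= index < 256):
        raise ValueError("code offset and index must each fit in 8 bits")
    return (code_offset << 8) | index


def unpack_ccd(ccd: int) -> Tuple[int, int]:
    """Return (code_offset, index) from a 16-bit descriptor."""
    return (ccd >> 8) & 0xFF, ccd & 0xFF


def pack_location(ccds: List[Tuple[int, int]]) -> int:
    """Pack exactly four descriptors into one 64-bit word (slot 0 lowest)."""
    if len(ccds) != CCDS_PER_LOCATION:
        raise ValueError("a location holds exactly four descriptors")
    word = 0
    for slot, (code_offset, index) in enumerate(ccds):
        word |= pack_ccd(code_offset, index) << (slot * CCD_BITS)
    return word


def unpack_location(word: int) -> List[Tuple[int, int]]:
    """Recover the four descriptors from a 64-bit word."""
    return [unpack_ccd((word >> (slot * CCD_BITS)) & 0xFFFF)
            for slot in range(CCDS_PER_LOCATION)]


def expand(local_frame_pointer: int, ccd: int) -> Tuple[int, int, int]:
    """Form a non-compressed descriptor by supplying the inferred pointer."""
    code_offset, index = unpack_ccd(ccd)
    return local_frame_pointer, code_offset, index
```

The `expand` helper reflects why compression is possible: the local frame pointer is not stored because it follows from where the work frame sits in main storage.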

Referring once again to FIG. 2, the local frame/work
frame pair is coupled to code frame 22 through code frame
pointer 46 of invocation context map entry 26 embedded
within local frame 20. Code frame 22 includes, for
instance, 256 code frame locations 62 (FIG. 4) and each
location includes a word-sized instruction 64 or an
in-line constant (not shown), which is associated with
the process to be executed by processing element 14 as
indicated by processing element number 30. Subsequent to
loading the instructions or constants (data for constants
are stored at code frame generation) into the code frame,
the code frame is immutable and thus, may be shared by
other processes and processing elements. In one
embodiment, code frame 22 is managed in main storage in
sequentially addressed and contiguous groupings of
sixteen word blocks. This allows for efficient transfer
of code frames in main storage to memory locations local
within the processing element (e.g., code frame caches,
which are described below).

Referring to FIG. 4a, instruction 64 is, for instance, 64
bits in length and includes the following fields:

(a) An 8-bit instruction operation code 66 which
specifies the operation to be executed by the processing
element. The operation code controls the arithmetic/logical
units and instruction sequencing. In addition, it also
controls network request generation;

(b) A two-bit thread control (TC) 68 which specifies the
sequencing controls for the current thread and its
successors (a process includes one or more threads of
executable instructions) within the processing element in
which the threads are being executed. The sequencing may
be, for example, sequential instruction dispatch,
preventive suspensive submode or end of thread, each of
which are described herein.
Sequential instruction dispatch is the mode of execution
which entails sequential dispatch of the instructions of
a thread being executed by the processing element.
Preventive suspensive submode causes suspension of the
current thread at initial dispatch into the processing
element of an instruction within the thread. Should the
instruction execute successfully, the thread is requeued
in a last in-first out fashion onto a local continuation
queue of the current process at code offset plus one
(described further below). If the instruction suspends on
a source operand reference, requeuing of the thread does
not take place at this time and deferring of the thread
occurs as described below. Following execution of the
instruction, a new thread is dispatched into the
processing element.
End of thread indicates to the processing element that
the current thread ends after execution of this
instruction. When termination of the thread is detected,
the processing element switches to the next available
thread to be executed. This thread may be from the same
process or a higher priority process, which is enqueued
LIFO (Last in-First out) after initial dispatch of the
current process into the processing element;
(c) A two-bit index increment control (X) 70 which
controls the increment of the value of index 58 (FIGS.
3a, 3b) in the current continuation descriptor. When
index increment control 70 is set to a nonzero value,
index 58 is updated after execution of the current
instruction and prior to execution of the succeeding
instructions in the same thread.
In one example, index increment control 70 may indicate
no change in the value of index 58 from this instruction
to the next or it may indicate the incremental value is
plus one or minus one;

(d) A sixteen bit destination specifier 72 is an 
address which indicates, for instance, the target 
of the instruction execution's result; and 

(e) A sixteen bit source operand 0 specifier 74 and a
sixteen bit source operand 1 specifier 76. Source operand
specifiers 74, 76 are addresses which enable source
operands to be obtained for the execution functions
within the processing element.
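The instruction fields listed above account for 60 of the 64 bits (8 + 2 + 2 + 16 + 16 + 16). A decoder for such a word might be sketched as follows. The bit positions are an assumption made for illustration, since the text gives field widths but not their placement: the operation code is put in the top eight bits, followed by thread control and index increment control, four unused bits, and then the three sixteen bit specifiers. The names `Instruction`, `encode` and `decode` are hypothetical.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int           # 8 bits, assumed at bits 63..56
    thread_control: int   # 2 bits, assumed at bits 55..54
    index_increment: int  # 2 bits, assumed at bits 53..52
    dest: int             # 16 bits, assumed at bits 47..32
    src0: int             # 16 bits, assumed at bits 31..16
    src1: int             # 16 bits, assumed at bits 15..0


def decode(word: int) -> Instruction:
    """Split a 64-bit instruction word into its fields."""
    if not 0 <= word < 1 << 64:
        raise ValueError("instruction must be a 64-bit unsigned value")
    return Instruction(
        opcode=(word >> 56) & 0xFF,
        thread_control=(word >> 54) & 0x3,
        index_increment=(word >> 52) & 0x3,
        dest=(word >> 32) & 0xFFFF,
        src0=(word >> 16) & 0xFFFF,
        src1=word & 0xFFFF,
    )


def encode(ins: Instruction) -> int:
    """Inverse of decode; bits 51..48 are left as zero."""
    return (
        ((ins.opcode & 0xFF) << 56)
        | ((ins.thread_control & 0x3) << 54)
        | ((ins.index_increment & 0x3) << 52)
        | ((ins.dest & 0xFFFF) << 32)
        | ((ins.src0 & 0xFFFF) << 16)
        | (ins.src1 & 0xFFFF)
    )
```

Round-tripping a value through `encode` and `decode` is a quick way to check that no two fields overlap under the assumed layout.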

Destination specifier 72 and source operand 
specifiers 74 and 76 each contain the following 
fields, as depicted in FIG. 4b: 

(a) A four bit addressing mode field (AM) 78 used to
encode the various sources (including, for example,
signed literal, indexed signed literal, local frame cache
(described below) or indexed local frame cache) for the
instruction operand and destination specifiers.
Addressing mode also encodes whether indexed operand
addressing is to be used.

(b) A four bit state function field (SF) 80. In one
embodiment, instructions accessing locations within a
local frame cache (described further below) include for
each source operand specifier and the destination
specifier a state function used in indicating the
synchronization function being used by that specifier. In
accordance with the principles of the present invention,
a number of synchronization functions may be supported
and therefore, there is a state function associated with
each of the available synchronization functions. Each
state function allows, for example, two interpretations,
one for a read access and one for a write access.
Examples of the synchronizing functions which may be
supported by the present invention include: One Time
Producer, Multiple Consumers (OPMC), which is similar to
I-structures and has a write once property. It refers to
the production of a data value which may be used by a
number of instructions; and Multiple Producer, Single
Consumer, which refers to the production of several data
values used by one thread. In one embodiment, the
resulting actions may be dependent on the state function
applied, the current state of the local frame location,
and the access type, read or write, as described in
detail below.
The state function field is ignored when addressing mode
78 selects, for example, a literal operand.

(c) An eight bit frame offset 82 interpreted as an offset
into one of the accessible local frames within a local
frame cache (described below) or as a literal operand, as
selected by the addressing mode of the source/destination
specifier. As explained more fully below, when frame
offset 82 is used as a frame offset it can be used
directly or it can be initially added to the value of
index 58 from the current continuation descriptor, modulo
256. In one embodiment, it is then appended to local
frame pointer 60 indicated by addressing mode 78 and a
local frame access within the local frame cache is
attempted under control of state function 80, described
in detail below.
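The three specifier fields total sixteen bits (4 + 4 + 8), and the indexed form adds index 58 to the frame offset modulo 256. A rough sketch follows. The ordering of the fields within the specifier is an assumption, and because the text does not say which addressing mode values select indexing, the choice is passed in as a plain boolean. The function names are hypothetical.

```python
from typing import Tuple


def split_specifier(spec: int) -> Tuple[int, int, int]:
    """Return (addressing_mode, state_function, frame_offset).

    Assumed layout: AM in bits 15..12, SF in bits 11..8, offset in 7..0.
    """
    if not 0 <= spec < 1 << 16:
        raise ValueError("specifier must be a 16-bit unsigned value")
    return (spec >> 12) & 0xF, (spec >> 8) & 0xF, spec & 0xFF


def effective_offset(frame_offset: int, index: int, indexed: bool) -> int:
    """Use the frame offset directly, or add the index modulo 256."""
    if indexed:
        return (frame_offset + index) % 256
    return frame_offset
```

For instance, a frame offset of 250 with an index of 10 wraps around to location 4 of the local frame when indexing is selected, and stays at 250 otherwise.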
Each code frame 22 within main memory control units 12
may exist in a state of absent or present. These states
exist and are managed by software. An absent state
indicates that the code frame is not in main storage and
therefore, a process requiring the absent code frame is
prevented from being instantiated. A present state
indicates that the code frame is present in main storage
and therefore, an inpage request from a processing
element may be serviced. Once the code frame is in this
state, it remains in this state until the frame is no
longer required and it is returned to free storage under
software control.

Referring once again to FIG. 1, main memory control units
12 are coupled to processing elements 14 through
interconnection network 18. In accordance with the
principles of the present invention, one example of the
hardware components associated with each processing
element 14 is depicted in FIG. 5 and includes the
following: a ready queue 84, a local continuation queue
86, a code frame cache 88, a local frame cache 90, a
state bit cache 91 and an execution unit 92. Each of
these components is described in detail below.

Ready queue 84 is, for example, a fully associative
memory structured essentially as a queue that is capable
of being enqueued at the head or the tail depending on
the enqueue discipline of the ready queue as specified by
invocation context queue control 36. Ready queue 84
includes a number of ready queue entries 94 (FIG. 6)
corresponding to processes or invocations to be executed
by processing element 14. In one instance, ready queue 84
includes sixteen ready queue entries. As depicted in FIG.
6 and described herein, each ready queue entry 94
includes, for example, the following fields:

(a) A three bit ready queue (RQ) state 95 used to
indicate the current state of a ready queue entry. Each
ready queue entry may be in one of a number of states,
including, for instance, the following: empty, indicating
that the entry is unused and available; ready, indicating
that the process is ready for execution by the processing
element; pending pretest, indicating that the process is
awaiting pretesting (described further below); prefetch
active, indicating that the required resources for
execution of the process are being fetched (this will
also be described in detail below); sleeping, indicating
that the ready queue entry is valid, but no threads
within the process are available for dispatching to the
processing element; and running, indicating a thread from
the process represented by the ready queue entry is being
executed by the processing element;
(b) A local frame pointer 96 is used, for instance, in
accessing local frame cache 90, which as described in
more detail below, includes the data required for process
execution. In addition, local frame pointer 96 is used in
determining whether a local frame exists within the local
frame cache. Local frame pointer 96 is a copy of local
frame pointer 60 and is loaded at the time that the ready
queue entry is filled in;

(c) A local frame cache physical pointer 97 is an address
into local frame cache 90;

(d) A three bit local frame cache state 98 is used to
indicate the current state of a local frame in local
frame cache 90. A local frame within the local frame
cache may have a number of states including, for example:
empty, indicating that the frame state for that local
frame is unknown or not present; transient, indicating
the local frame is currently being inpaged from main
memory control units 12 to local frame cache 90 in
processing element 14; and present, indicating the local
frame is located in local frame cache 90;
(e) A code frame pointer 99 is used in accessing code
frame cache 88. Code frame pointer 99 is a copy of code
frame pointer 46 located in local frame 20;

(f) A code frame cache physical pointer 100 is used to
address a block of instructions in code frame cache 88,
as described further below;

(g) A three bit code frame cache state 101 is used to
determine the current state of a code frame within code
frame cache 88. A code frame may have a number of states
including, for example: empty, indicating that the frame
state for a particular code frame is unknown or not
present; transient, indicating the code frame is
currently being inpaged from main memory control units 12
to code frame cache 88 in processing element 14; and
present, indicating the code frame is located in code
frame cache 88;

(h) A local continuation queue head pointer 102 is
located in each ready queue entry and is used, as
described more fully below, in indicating the head of the
list of threads for the particular process to be executed
within a processing element. During processing, as
described below, local continuation queue head pointer
102 receives its information from local continuation
queue head pointer 38 located within invocation context
map entry 26 of local frame 20 which is associated with
the process to be executed; and

(i) A local continuation queue tail pointer 103 is
located in each ready queue entry and is used, as
described more fully below, in indicating the tail of the
list of threads for the particular process. Similar to
head pointer 102, local continuation queue tail pointer
103 is received from local frame 20. In particular,
during enqueue into the ready queue, local continuation
queue tail pointer 40 in local frame 20 is copied into
ready queue entry 94.
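The ready queue entry fields and the head-or-tail enqueue discipline might be modelled along these lines. This is a software sketch only, not the associative memory itself. The class and attribute names are hypothetical, the cache-related fields (physical pointers and cache states) are omitted for brevity, the initial state given to a new entry is an assumption, and because the text does not say which value of the one bit control selects the head, the choice is passed as a boolean.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class RQState(Enum):
    EMPTY = auto()
    READY = auto()
    PENDING_PRETEST = auto()
    PREFETCH_ACTIVE = auto()
    SLEEPING = auto()
    RUNNING = auto()


@dataclass
class ReadyQueueEntry:
    local_frame_pointer: int
    code_frame_pointer: int
    lcq_head: int   # copied from the head pointer held in the local frame
    lcq_tail: int   # copied from the tail pointer held in the local frame
    state: RQState = RQState.PENDING_PRETEST  # assumed initial state


class ReadyQueue:
    CAPACITY = 16  # "sixteen ready queue entries" in one instance

    def __init__(self) -> None:
        self._entries: deque = deque()

    def enqueue(self, entry: ReadyQueueEntry, at_head: bool) -> None:
        """Insert at the head or the tail, as the enqueue control dictates."""
        if len(self._entries) >= self.CAPACITY:
            raise OverflowError("ready queue is full")
        if at_head:
            self._entries.appendleft(entry)
        else:
            self._entries.append(entry)

    def next_ready(self) -> Optional[ReadyQueueEntry]:
        """Return the entry nearest the head whose state is READY, if any."""
        for entry in self._entries:
            if entry.state is RQState.READY:
                return entry
        return None

    def __len__(self) -> int:
        return len(self._entries)
```

Enqueueing at the head lets a later arrival be considered ahead of earlier entries, while enqueueing at the tail preserves arrival order.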
Associated with each ready queue entry 94 is a 
local continuation queue 86 (FIG. 5). Each local 
continuation queue is, for example, a first in-first
out queue wherein the top entry in the queue is the 
oldest. In general, local continuation queue 86 con- 
tains all of the pending threads or continuations 
associated with a process which is on the ready
queue. The local continuation queue head and tail 
pointers located in ready queue entry 94 indicate 
the valid entries in the local continuation queue for 
the particular ready queue entry. Depicted in FIG. 7 
is one example of local continuation queue 86. 

Local continuation queue 86 includes a number
of local continuation queue entries 104, in which
each entry represents a pending thread for a
particular process. Each local continuation queue entry
104 contains a compressed continuation descriptor
including a code offset 105 and an index 106,
which are received from work frame 23 (i.e., code
offset 56 and the corresponding index) of main
memory control units 12. Code offset 105 is used to
address an instruction within a code frame located in
code frame cache 88, and index 106 is used during
indexed addressing to alter the value of the address
used to locate data within local frame cache 90.
Local continuation queue 86 is coupled to code
frame cache 88 via code frame cache physical
pointer 100, as described in detail herein. Referring
to FIG. 8, code frame cache 88 includes, in one
example, 128 code frames 108 and each code
frame includes, e.g., 256 instructions. In one
embodiment, the code frames located in code frame
cache 88 are inpaged from main memory control
units 12 to code frame cache 88 during a prefetch
stage, described below. Code frame cache 88
supports two simultaneous access ports: a read port
used in fetching instructions and a write port used
in writing code frames from main storage to the
code frame cache.

In order to locate code frame 108, code frame 
pointer 99 located in ready queue entry 94 of 
ready queue 84 is input into a code frame cache 
directory 110 in order to obtain a value for code 
frame cache physical pointer 100. In one embodi- 
ment, code frame cache directory 110 is organized 
to allow an 8-way set-associative search. 

Referring to FIG. 9, code frame cache directory
110 includes, for example, sixteen rows and eight
columns and each column and row intersection
includes an entry 114. Each entry 114 includes a
code frame address tag 116 and a state field 118.
Code frame address tag 116 is, for example, the
upper thirty-six bits of the 40-bit code frame
pointer 99 and is used in determining the address value
of code frame cache physical pointer 100. State
field 118 is a three-bit field used in indicating the
state of a particular code frame 108 within code
frame cache 88. A code frame within the code
frame cache may have one of the following states:

(a) An empty state which is defined by an
unsuccessful attempt within a processing element
to locate a particular code frame within the code
frame cache. This state is proper when the code
frame exists only in main storage or within
another processing element. The empty state is
recorded in the code frame cache at system
initialization and whenever a code frame
invalidation occurs.

(b) A transient state which applies to a code
frame when it is in a state of motion. For
example, the code frame is being moved from main
storage to the code frame cache within the
processing element (an inpage operation). During
inpaging, one of two possible transient
states may be recorded for the frame, depending
on the desired final state of the code frame
at inpage completion. The state is recorded as
transient-final state, where final state may be the
present state for a pretest/prefetch inpage
(described below), or the pinned state for a
pretest/prefetch inpage for which invocation context
map entry cache pinning control 37 was active.
The transient state of a code frame in the code
frame cache prevents selection of the code
frame by a cache replacement algorithm, such
as, for example, a least recently used (LRU)
algorithm, thereby allowing eventual completion
of the inpage operation.

(c) A present state which indicates that the
contents of the desired code frame are entirely
within code frame cache 88. When the code
frame is in this state, then processing element
14 may fetch the instructions located in code
frame cache 88.

(d) A pinned state which also indicates that the
contents of the desired code frame are entirely
within the code frame cache. However, if a code
frame is marked as pinned, then replacement of
the frame during pretest/prefetch is prevented
(described below). In order to remove a pinned
code frame from the cache, explicit software
action is taken.

Address tag 116 is used in conjunction with
code frame pointer 99 to determine an address
value for code frame cache physical pointer 100. In
particular, the four rightmost bits of code frame
pointer 99 (FIG. 8) are used to index into one of the
rows within code frame cache directory 110.
Subsequent to obtaining a particular row, the contents
of each code frame cache address tag 116 within
the selected row are compared against the value of
bits 12-47 of code pointer 46. If a match is found,
then the address value of the code frame cache
physical pointer is obtained. In particular, the
address of pointer 100 is equal to the row identifier
(i.e., the four rightmost bits of code frame pointer
99) and column identifier, which is the binary
representation of the column (i.e., columns 0-7) in which
the match was found.
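
The directory search just described can be sketched as follows. This is only an illustration under stated assumptions, not the claimed hardware: the names make_directory and lookup are hypothetical, the directory is modelled as a list of rows, and an entry in the empty state is assumed never to match.

```python
# Sketch only: 16 rows x 8 columns, as in the example directory.
ROWS, WAYS = 16, 8
ROW_BITS = 4   # the four rightmost bits of the 40-bit pointer select the row


def make_directory():
    """Build a directory in which every entry is empty (hypothetical layout)."""
    return [[{"tag": None, "state": "empty"} for _ in range(WAYS)]
            for _ in range(ROWS)]


def lookup(directory, code_frame_pointer):
    """Return the 7-bit physical pointer (row then column), or None on a miss."""
    row = code_frame_pointer & (ROWS - 1)    # four rightmost bits
    tag = code_frame_pointer >> ROW_BITS     # upper thirty-six bits
    for column, entry in enumerate(directory[row]):
        # Assumption: empty entries are skipped; any other state may match.
        if entry["state"] != "empty" and entry["tag"] == tag:
            return (row << 3) | column       # row identifier, then column identifier
    return None
```

With sixteen rows and eight columns the returned value spans 0-127, which agrees with the 128 code frames of the example cache.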

Subsequent to determining code frame cache 
physical pointer 100, the physical pointer is used in 
conjunction with code offset 105 located in local 
continuation queue 86 to locate an instruction 120 
within code frame 108. In order to select a
particular instruction 120 within code frame 108, code
frame cache physical pointer 100 is appended at
122 on the left of code offset 105 located in local
continuation queue entry 104.

In one embodiment, instruction 120 includes
the following fields which are loaded from the copy
of code frame 22 located within main storage (the
following fields are similar to the instruction fields
described with reference to FIG. 4a, and therefore,
some of the fields are not described in detail at this
point): an operation code (OP CODE) 124, a thread
control (TC) 125, an index increment control (X)
127, a destination specifier 126, a source operand
zero specifier 128 and a source operand one
specifier 130. Destination specifier 126 indicates the
address in which the result of the instruction
execution is to be written and the source operand
specifiers indicate the addresses of the data operands
located in local frame cache 90 to be read and
used during execution of the instruction.

Code frame cache 88 is coupled to local frame
cache 90, as described in detail herein. Referring
to FIGS. 10a, 10b, local frame cache 90 includes,
for example, 256 local frames (131) and each
frame includes 256 data words (132) (e.g.,
invocation context queue information, destination location,
source operands). In one embodiment, local frame
cache 90 is organized into eight parallel word-wide
banks. Each local frame 131 spans across all eight
banks such that each bank stores thirty-two words
of local frame 131. In one example, the first bank
(bank 0) holds the following words of local frame
131: word 0, 8, 16, 24, 32, 40, ..., 248 (i.e., every
eighth word of the local frame); the second bank
(bank 1) holds words 1, 9, 17, 25, 33, 41, ..., 249, etc. It
will be apparent to one of ordinary skill in the art
that this is only one way in which the local frame
cache may be organized and the invention is not
limited to such a way. Local frame cache 90
supports two simultaneous access ports, a read port
and a read/write port (not shown). The read port is
used for fetching operands and the read/write port
is used for storing results from instruction
execution and for deferring continuations, as described
below.
In one embodiment, the local frames located in
local frame cache 90 are inpaged from main
memory control units 12 (i.e., datum 50 is inpaged) to
local frame cache 90 during a prefetch stage,
described below. In order to locate a local frame
within the local frame cache (so that inpaged
information may be written to a location within the
local frame or so that information may be read
from a particular location), local frame pointer 96
located in ready queue entry 94 is input into a local
frame cache directory 133 in order to obtain an
address value for local frame cache physical
pointer 97 located in the ready queue entry. (In another
embodiment, it is also possible to obtain the local
frame cache physical pointer during pretesting
(described below), thereby eliminating the process
for obtaining the pointer address from the cache
directory.) In one embodiment, local frame cache
directory 133 is organized in a similar manner to
code frame cache directory 110, i.e., it is organized
to allow an 8-way set-associative search.

Referring to FIG. 11, local frame cache
directory 133 includes, for example, thirty-two rows and
eight columns and each column and row
intersection includes an entry. Each entry includes
a local frame address tag 136 and a state field 138.
Local frame address tag 136 is, for example, the
upper thirty-five bits of the 40-bit local frame
pointer 96 and is used in determining the address value
of local frame cache physical pointer 97. State field
138 is a three-bit field used in indicating the state
of a particular local frame 131 within local frame
cache 90. A local frame within local frame cache 90
may have one of the following states:

(a) An empty state which is defined by an
unsuccessful attempt within a processing element
to locate a particular local frame within local
frame cache 90. This state is valid for a local
frame on the main storage free list or for
one which resides entirely in main storage and
is allocated to a process. The empty state may
also be detected when a castout from the local
frame cache to main storage is in progress for
the referenced local frame, resulting in the
actual inpage being delayed until castout
completion. The empty state is recorded throughout
local frame cache at system initialization and
within local frame 131 in the cache whenever an
attempted local frame inpage replacing a frame
in cache is aborted.

(b) A transient state which applies to a local
frame when it is in a state of motion, e.g.,
moving from main storage to the local frame cache
(a local frame inpage). During the local frame
inpage, the state of the inpaging frame is
recorded within local frame cache state 98. During
inpage, one of the transient states is recorded
for the frame depending upon the desired final
state of local frame 131 at inpage completion.
The final state may be present for a
pretest/prefetch inpage (explained further below)
or pinned for a pretest/prefetch inpage with
invocation context map entry pinning control 37
active. The transient state in the local frame
cache prevents selection of local frame 131 by a
local frame cache replacement algorithm (LRU),
thereby allowing eventual completion of the
inpage operation. This allows completion of any
castout associated with the aborted inpage.

(c) A free state which indicates a valid local
frame in the local frame cache which is currently
not allocated to any process. As one example, a
local frame enters this state through process
termination.

(d) A present state which indicates that the
contents of local frame 131 are entirely within
the local frame cache. When the local frame is
within this state, the contents are available for
access by an instruction within the processing
element.

(e) A pinned state which also indicates that the
contents of the desired local frame are entirely
within the local frame cache. However, if a local
frame is marked as pinned, then replacement of
the frame by pretest/prefetch is prevented
(described below). In order to remove a pinned
local frame from the cache, software action must
be taken.
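
The effect of the transient and pinned states on frame replacement can be illustrated with a short sketch. It is a hedged illustration only: the names FrameState and choose_victim are hypothetical, the per-column timestamp stands in for whatever the LRU hardware records, and the preference for empty or free columns over present ones is an assumption not stated in the text.

```python
from enum import Enum


class FrameState(Enum):
    EMPTY = "empty"
    TRANSIENT = "transient"
    FREE = "free"
    PRESENT = "present"
    PINNED = "pinned"


# States that a replacement algorithm is never allowed to select.
PROTECTED = {FrameState.TRANSIENT, FrameState.PINNED}


def choose_victim(row):
    """row: list of (state, last_used) pairs, one per column of a directory row.

    Returns the column to replace, or None if every column is protected.
    """
    candidates = [(last_used, col)
                  for col, (state, last_used) in enumerate(row)
                  if state not in PROTECTED]
    if not candidates:
        return None
    # Assumption: a column holding nothing useful is taken first.
    for _, col in sorted(candidates):
        if row[col][0] in (FrameState.EMPTY, FrameState.FREE):
            return col
    # Otherwise fall back to the least recently used unprotected column.
    return min(candidates)[1]
```

Because transient frames are excluded, an inpage that is already under way is never chosen as a victim, which is the property the transient state is described as providing.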


Address tag 136 is used in conjunction with
local frame pointer 96 to determine the address
value of local frame cache physical pointer 97. In
particular, the five rightmost bits of local frame
pointer 96 are used to index into one of the rows
within local frame cache directory 133. Subsequent
to obtaining a particular row, the contents of each
local frame address tag 136 within the selected row
are compared against the value of the corresponding
bits of the logical local frame address (base address
of the local frame in main storage). If a match is
found, then the address value of local frame cache
physical pointer 97 is obtained. In particular, the
address of the pointer is equal to the row identifier
(i.e., the five rightmost bits of local frame pointer
96) and column identifier, which is the binary
representation of the column (i.e., columns 0-7) in which
the match was found.
Subsequent to determining local frame cache
physical pointer 97, the physical pointer is used
along with source operand 0 specifier 128 and
index 106 (i.e., the index is used if the addressing
mode indicates that index addressing is to be
used) to select a datum 132 from local frame
cache 90 representative of a source operand 0 to
be used during execution of an instruction. That is,
a frame offset 140 of source operand 0 specifier
128 (as previously described, each specifier
includes an addressing mode (AM), state function
(SF) and frame offset) is added at 142 to index 106
and then, local frame cache physical pointer 97 is
appended on the left of the summation to indicate
a particular datum (e.g., source operand 0) within
the local frame cache.

Similarly, local frame cache physical pointer 97
is used with source operand 1 specifier 130 and
index 106 to select a datum 132 from local frame
cache 90 representative of a source operand 1 also
to be used during instruction execution. In
particular, a frame offset 144 of source operand 1
specifier 130 is added at 146 to index 106 and then,
local frame cache physical pointer 97 is appended
on the left of the summation to indicate a particular
datum (e.g., source operand 1) within the local
frame cache.

In addition to the above, local frame cache
physical pointer 97 is also used with destination
specifier 126 and index 106 (again, if the index is
to be used) to select a datum 132 from local frame
cache 90 representative of the location within the
local frame cache in which, e.g., the result of the
instruction execution is to be stored. In particular, a
frame offset 147 of destination specifier 126 is
added at 149 to index 106 and then, local frame
cache physical pointer 97 is appended on the left
of the summation to indicate a particular datum
(e.g., a result location) within the local frame cache.
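
The address formation used for both source operands and for the destination can be summarized in a minimal sketch. The function name datum_address is hypothetical, and the wrap of the offset-plus-index sum within the 256-word frame is an assumption, since the text does not say how overflow of the summation is handled.

```python
FRAME_WORDS = 256   # data words per local frame, i.e. an 8-bit word offset


def datum_address(physical_pointer, frame_offset, index=0, indexed=False):
    """Form the local frame cache address of one datum.

    The index is added only when the addressing mode calls for indexed
    addressing. Assumption: the sum wraps within the 256-word frame.
    """
    offset = frame_offset + (index if indexed else 0)
    offset %= FRAME_WORDS
    # Appending the physical pointer on the left of the sum is equivalent
    # to placing it in the high-order bits of the address.
    return physical_pointer * FRAME_WORDS + offset
```

For example, datum_address(3, 10, index=5, indexed=True) selects word 15 of the frame whose physical pointer is 3.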
Associated with each datum stored in local
frame cache 90 is a 3-bit state indicator 148
located in state bit cache 91. Similar to local frame
cache 90, state bit cache 91 includes, for example,
256 locations (152) and each location includes 256
3-bit state indicators 148. In one embodiment, state
bit cache 91 is organized into eight
banks accessible in parallel. Each location 152
spans across all eight banks such that each bank
stores thirty-two words of location 152. (The
organization of the state bit cache is similar to the
organization of local frame cache 90, described
in detail above.) In accordance with the present
invention, state indicators 148 are inpaged from
main storage to state bit cache 91 (i.e., state field
28 of data entry 48 is copied) in parallel with the
copying of datum 50 to local frame cache 90.
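
The eight-bank interleaving shared by the local frame cache and the state bit cache amounts to a modulo mapping. The sketch below is illustrative only; bank_and_row is a hypothetical helper and the slot numbering within a bank is an assumption.

```python
BANKS = 8   # eight banks accessible in parallel


def bank_and_row(word):
    """Map a word offset (0-255) within a frame to (bank, slot within that bank)."""
    return word % BANKS, word // BANKS


# Every eighth word of a frame lands in the same bank.
words_in_bank0 = [w for w in range(256) if bank_and_row(w)[0] == 0]
```

Under this mapping each bank holds thirty-two words of a 256-word frame, and eight consecutive words always fall in eight different banks.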

The state bits are loaded into or read from the
state bit cache in a manner similar to that
described above for the local frame cache. In
particular, as shown in FIGS. 10a, 10b, each of the
addresses obtained (e.g., by appending the local
frame cache physical pointer on the left of the
summation of the particular frame offset and the
index, if needed) and used to select a datum 132
(either a source operand or a destination location)
from local frame cache 90 is also used to select an
associated state bit indicator 148 from state bit
cache 91. Each state bit indicator 148 represents
the current state of a particular datum. A particular
datum and its associated state bit indicator are
selected in parallel from local frame cache 90 and
state bit cache 91, respectively, using the process
described above. However, it will be apparent to
one of ordinary skill in the art that the state bit
cache may be organized in a number of ways and
that only one embodiment is described herein. It
will also be apparent to one of ordinary skill in the
art that it is also possible to eliminate the state bit
cache and place the state bit indicators within the
local frame cache, e.g., adjacent to its associated
datum.
State bit indicators may have a number of
states (as one example, empty, waiting or present)
and when an operand and an associated state
indicator are selected from local frame cache 90
and state bit cache 91, respectively, the next state
for each selected operand is determined. In
addition, when a result is to be written to a location, the
next state for that location is determined. In one
embodiment, in order to determine the next state of
an operand or a result location, a plurality of state
transition tables and a state function associated
with each specifier are used. In particular, there is a
state function 154 for destination specifier 126, a
state function 156 for source operand 0 specifier
128 and a state function 158 for source operand 1
specifier 130. Each of the state functions is used to
indicate the synchronization function described
above associated with its specific specifier and
each state function is used as an address into a
state transition table. In one embodiment, there is a
state transition table for each specifier. That is,
there is a state transition table 160 associated with
destination specifier 126, a state transition table
162 associated with source operand 0 specifier 128
and a state transition table 164 associated with
source operand 1 specifier 130. Located within
each of the state transition tables is an entry 165
which includes the possible next states 166 for
each of the possible state functions. For example, if
state function 154 represents a synchronizing
function of One-Time Producer, Multiple Consumer,
then located within state transition table 160 (state
function 154 indexes into state transition table 160)
is entry 165 including the possible next states for
that synchronizing function. In one example, each
entry may include eight possible next states.
Further, if state function 154 could represent another
synchronizing function, such as Multiple Producer,
Single Consumer, then there would be another
entry within state transition table 160 containing the
possible next states for that synchronizing function.
Similarly, state transition tables 162 and 164
include entries 165 which contain the possible next
states 166. Each state transition table is, e.g.,
located within the processing element and may be
statically altered at system initialization in any
known manner to include additional entries of next
states which support further synchronizing
functions.

As shown in FIG. 10b, associated with each
next state 166 is a 3-bit control flag 168. Control
flag 168 is set at system initialization and is fixed
for its associated next state. Control flag 168 is
used in indicating to the processing element which
action is to be taken for the thread which includes
instruction 120. That is, control flag 168 indicates,
for instance, whether execution of the thread is to
be continued or whether execution is to be
deferred (explained below).

Referring to FIG. 12, in operation, a process to
be executed is dispatched by interconnection
network 18 to one of processing elements 14, STEP
180 "Dispatch Process." Subsequent to receiving
the dispatched process, a decision is made within
the processing element as to whether the process
is to be placed on the invocation context queue
located within the main memory control unit which
is associated with the particular processing element
or on ready queue 84 located within the processing
element, STEP 182 "Determine Where to Place
Incoming Process."

In particular, in deciding where to place the
process, an initial inquiry is made as to whether the
process is to be enqueued onto the ready queue in a
first in-first out manner, INQUIRY 184 "Enqueued
FIFO?" Should the process be enqueued in a
first in-first out manner, then a check
is made to see if the ready queue is full and,
therefore, cannot accept any more processes,
INQUIRY 186 "Ready Queue Full?" If the ready
queue is full, the process is placed onto the tail
end of the invocation context queue in main
storage until a position is available in the ready queue,
STEP 188 "Enqueue onto ICQ." When placing a
process on the tail of the invocation context queue,
invocation context queue backward pointer 44
located within invocation context map entry 26 of the
process being added is replaced with the current
value of the invocation context queue tail. In
addition, invocation context queue forward pointer 42 of
the last process identified by the old tail is updated
to indicate the new tail of the invocation context
queue, which is equal to the local frame pointer of
the process being added. Further, the invocation
context queue tail is set equal to the local frame
pointer of the process being added, STEP 189
"Update ICQT and ICQH."

Returning to INQUIRY 186, if, however, the
ready queue is not full, then the process is added
to the tail end of ready queue 84, STEP 190
"Enqueue onto the Ready Queue." In addition to
loading the process onto the ready queue, one or
more threads associated with the process are
enqueued onto local continuation queue 86, STEP
191 "Place Thread on LCQ." Subsequently, in
order to indicate that there are valid entries in the
local continuation queue for the process on the
ready queue, local continuation queue head 38 and
tail 40 are copied from invocation context map
entry 26 to local continuation queue head 102 and
tail 103 located in ready queue entry 94 designated
for that process.

When a process is placed on the ready queue,
ready queue state 95 located within ready queue
entry 94 is updated from empty to pending pretest,
STEP 192 "RQ State is Updated."

Referring back to INQUIRY 184, should a
process be enqueued onto ready queue 84 in a last
in-first out fashion, then the process is enqueued
onto the head of the ready queue with the
possibility of replacing a valid ready queue entry 94 at the
tail of the ready queue, STEP 190 "Enqueue onto
Ready Queue." Once again, when the process is
added to the ready queue, threads for that process
are placed on local continuation queue 86, STEP
191 "Place Thread on LCQ," and ready queue state
95 is updated from empty to pending pretest, STEP
192 "RQ State is Updated." A valid entry may be
castout to the head of the invocation context queue
in main storage. When adding to the head of the
invocation context queue, invocation context queue
forward pointer 42 for the new process is updated
to point to the old head of the invocation context
queue. In addition, invocation context queue
backward pointer 44 of the old head is updated to point
to the process being added (using the local frame
pointer). Further, the invocation context queue head
is updated to point to the new process represented
by the local frame pointer. Also, local continuation
queue head 102 and tail 103 are copied from ready
queue entry 94 to local continuation queue head 38
and tail 40 in invocation context map entry 26.
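
The pointer updates for adding a process to the tail or to the head of the invocation context queue follow the usual doubly linked list pattern. The sketch below is a minimal illustration under stated assumptions: the class InvocationContextQueue and its methods are hypothetical names, two dictionaries stand in for forward pointer 42 and backward pointer 44 of each invocation context map entry, and the handling of an empty queue is an assumption not covered by the text.

```python
class InvocationContextQueue:
    """Doubly linked queue; each process is identified by its local frame pointer."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.forward = {}    # stands in for forward pointer 42 of each entry
        self.backward = {}   # stands in for backward pointer 44 of each entry

    def enqueue_tail(self, lfp):
        """First in-first out case: the ready queue was full."""
        self.backward[lfp] = self.tail       # new entry points back at the old tail
        self.forward[lfp] = None
        if self.tail is not None:
            self.forward[self.tail] = lfp    # old tail points forward at the new entry
        else:
            self.head = lfp                  # assumption: queue was empty
        self.tail = lfp

    def enqueue_head(self, lfp):
        """Last in-first out case: an entry cast out of the ready queue."""
        self.forward[lfp] = self.head        # new entry points forward at the old head
        self.backward[lfp] = None
        if self.head is not None:
            self.backward[self.head] = lfp   # old head points back at the new entry
        else:
            self.tail = lfp                  # assumption: queue was empty
        self.head = lfp

    def walk(self):
        """Return the local frame pointers from head to tail."""
        node, order = self.head, []
        while node is not None:
            order.append(node)
            node = self.forward[node]
        return order
```

Keying every link by the local frame pointer mirrors the description, in which the head, the tail and both per-entry pointers all hold local frame pointers.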
As previously mentioned, when a process is
added to the ready queue, the state of the ready
queue entry is updated from empty to pending
pretest. During pending pretest, the availability of
the resources required for execution of the process
is determined, INQUIRY 194 "Are Resources
Available?" In particular, code frame cache 88 is
checked to see whether code frame 108 as
indicated by code frame pointer 99 in ready queue
entry 94 is located within the code frame cache.
Similarly, local frame cache 90 is checked to
determine if local frame 131 as indicated by local frame
pointer 96 in ready queue entry 94 is located within
the local frame cache. Should it be determined that
code frame 108 or local frame 131 is not present
within its respective cache and, therefore, is not
available to the process during processing, the
missing code frame and/or local frame is inpaged
from main storage and thereby made available,
STEP 196 "Prefetch Resources." In particular,
code frame 108 is copied from code frame 22 in
main storage to code frame cache 88 (inpaging).
Further, local frame 131 is copied from datum 50
located within local frame 20 in main memory
control units 12 to local frame cache 90 and, in
parallel, state indicator 28 which is associated with
the datum is inpaged from main memory control
units 12 (i.e., local frame 20) to state bit cache 91.
The moving of data between main memory control
units and one or more caches allows for the
number of processes and threads which can be
executed by the processing element to be bound
only by the size of main storage and not by a finite
amount of local storage. During inpaging, ready
queue state 95 is updated from pending pretest to
prefetch active, STEP 198 "Update RQ State."

Subsequent to inpaging the resources during
prefetch or if an affirmative response is obtained
from INQUIRY 194, ready queue state 95 is
updated from prefetch active to ready, indicating that
the process is ready for execution by the
processing element, STEP 200 "Update RQ State." A
ready process may be executed by the processing
element when the process is the top
entry in ready queue 84. When the top entry is
selected for execution, the top thread located in
local continuation queue 86 is selected, STEP 202
"Select Process and Thread." When this occurs,
ready queue state 95 is updated from ready to
running, STEP 204 "Update RQ State." In addition,
the state of the previous running ready queue entry
is changed from running to empty, ready or
sleeping (all of which are described above) depending
on the conditions for which it relinquishes control of
processing within the processing element.
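
The ready queue state changes described in STEPS 192 through 204 can be collected into a small transition table. This is a sketch under stated assumptions rather than the patented logic: the names ALLOWED, advance and admit are hypothetical, and the direct move from pending pretest to ready when both frames are already cached is an assumption made for the case in which no prefetch is needed.

```python
# Transitions taken from the description; "sleeping" has no outgoing
# transitions here because none are described in this passage.
ALLOWED = {
    "empty": {"pending pretest"},
    "pending pretest": {"prefetch active", "ready"},
    "prefetch active": {"ready"},
    "ready": {"running"},
    "running": {"empty", "ready", "sleeping"},
}


def advance(state, new_state):
    """Return new_state if the transition is permitted, otherwise raise."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal ready queue transition: {state} -> {new_state}")
    return new_state


def admit(code_frame_cached, local_frame_cached):
    """Trace the states of one ready queue entry from enqueue to running."""
    trace = ["empty"]
    trace.append(advance(trace[-1], "pending pretest"))
    if not (code_frame_cached and local_frame_cached):
        # A missing code frame and/or local frame is inpaged first.
        trace.append(advance(trace[-1], "prefetch active"))
    trace.append(advance(trace[-1], "ready"))
    trace.append(advance(trace[-1], "running"))
    return trace
```

Calling admit(False, True) yields a trace that passes through prefetch active, while admit(True, True) skips it.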

The selected thread (or local continuation
queue entry 104) from local continuation queue 86
includes code offset 105 which is used, as
described above, in selecting an instruction 120 from
code frame cache 88 to be executed, STEP 206
"Fetch Instruction." When instruction 120 is
fetched, local continuation queue head pointer 102
located in ready queue entry 94 is adjusted to
indicate the removal of the processing thread from
the local continuation queue, STEP 208 "Adjust
LCQ Head."

As described above, the instruction which is
selected includes source operand 0 specifier 128
which is used to select datum 132 representative of
a source operand 0 from local frame cache 90 and
its associated state bit 148 from state bit cache 91.
Also, source operand 1 specifier 130 is used to
select datum 132 representative of a source
operand 1 from local frame cache 90 and its
associated state bit 148 located in state bit cache 91,
STEP 210 "Select Data and State Bits."



In addition to the above, state functions 156
and 158 located in source operand 0 specifier 128
and source operand 1 specifier 130, respectively,
are used in selecting a number of possible next
states 166 from state transition tables 162, 164. In
particular, state function 156 is used as an address
into state transition table 162 to select entry 165
which includes the next states for source operand 0
specifier. Similarly, state function 158 is used as an
address into state transition table 164 to select
entry 165 which includes the next states for source
operand 1 specifier, STEP 212 "Select Possible
Next States." (As described above, each state
function is representative of a synchronizing
function and the states associated with each
synchronizing function are included in the state transition
tables.)

Subsequent to selecting the possible next
states for a source operand, the current state (state
indicator 148) of the operand is used in choosing
one state from the possible next states which
represents the next state for that operand. For
example, the value of state bit indicator 148 is used as
an index to a particular column within the selected
entry, and the state located in that column is the
next state, STEP 214 "Determine Next State." As
an illustration, suppose that, for a particular
synchronizing function, an operand has
been read and the possible next states for that
synchronizing function are empty, waiting
and present. In one example, the next state to be
selected for that operand is the present state. After
the next state is determined, state indicator 148 is
updated to the value of the next state, e.g., by
writing the value of the next state into the current
state value located in state bit cache 91, STEP 216
"Update Current State."
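
The two-step selection of a next state (state function selects an entry, current state selects one state within it) can be sketched as a table lookup. Everything specific in this sketch is an assumption made for illustration: the numeric state encodings, the name READ_TABLE, the control flag strings, and the transitions for the empty and waiting columns. Only the present-stays-present read case is taken from the text, and the entry is shortened from eight possible next states to three.

```python
EMPTY, WAITING, PRESENT = 0, 1, 2   # hypothetical encodings of the 3-bit indicator

# One table per specifier. The state function addresses an entry and the
# current state picks a column. Each cell holds (next_state, control_flag).
READ_TABLE = {
    "one_time_producer_multiple_consumer": [
        (WAITING, "defer"),      # current state empty: assumed to become waiting
        (WAITING, "defer"),      # current state waiting: assumed to stay waiting
        (PRESENT, "continue"),   # current state present: the read succeeds
    ],
}


def next_state(table, state_function, current_state):
    """Select the entry by state function, then the cell by current state."""
    return table[state_function][current_state]


def read_operand(state_bits, address, state_function, table=READ_TABLE):
    """Look up the next state, write it back, and return the control flag."""
    new_state, flag = next_state(table, state_function, state_bits[address])
    state_bits[address] = new_state   # update the current state in place
    return flag
```

Here state_bits plays the role of the state bit cache, addressed with the same address that selects the datum in the local frame cache.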
In addition to the above, a determination is
made as to the course of action to be taken by the
thread which includes the instruction being
executed, STEP 218 "Determine Action to be Taken."
Types of actions which may be taken include, for
instance, continue with thread execution, suspend
current thread and awaken deferred thread, each of
which is explained below.

In one embodiment, in order to determine what
action is to be taken by the thread, an inquiry is
made into whether the data (e.g., source operand 0
and/or source operand 1) located in local frame
cache 90 and selected by executing instruction 120
is available for use by the instruction, INQUIRY 220
"Is Data Available?" In particular, this determination
is made by checking state indicator 148 (before it
is updated to the next state) associated with each
of source operands 0 and 1. Should state indicator
148 indicate, for instance, that an operand is in an
empty state, then that operand is considered
unavailable. If, however, state indicator 148 indicates
that each operand is in, for example, a present
state, then the operands are considered available.
If the data is available, then execution of the thread
continues and the result of the executing instruction
is stored in a result location within local frame
cache 90, STEP 222 "Continue Execution," as
described in detail herein.

In one example, instructions are executed
within execution unit 92, which is coupled to local
frame cache 90 within processing element 14.
Addressing mode 78 of each source operand specifier
located in instruction 120 gates the appropriate
data, such as source operand 0 and source
operand 1 (which has been obtained as described
above), into input registers (not shown) of
execution unit 92. Execution unit 92 executes the
instruction using the obtained operands and places the
result in a destination (or result) location located
within local frame cache 90 indicated by
destination specifier 126 of instruction 120. If, however,
the result of the instruction execution is a branch to
a specific location, then a new thread (in the form of
a new compressed continuation descriptor) may be
enqueued onto local continuation queue 86 (if the
thread is for the same process that is currently
being executed) or a new thread may be handled
by interconnection network 18 and enqueued onto
a different process' local continuation queue.

On the other hand, if the answer to INQUIRY
220 is in the negative and one or more of the
source operands are not available (e.g., the state
indicator associated with that operand indicates the
operand is not in a present state), then execution of
the thread associated with the executing instruction
is deferred, STEP 224 "Defer Execution of
Thread." In one example, the particular instruction
continues executing, but the results are not stored.
In particular, if source operand 0 or source operand
1 is in, for example, a state of empty or waiting and
therefore unavailable (if both operands are
unavailable, then in one embodiment, operand zero is
preferred over operand one), then the thread
currently executing (represented by code offset 105
and index 106 in local continuation queue entry
104 within local continuation queue 86) is
suspended until source operand 0 and source operand
1 (if both are needed) are available. When
suspension occurs, any effects of the instruction are
nullified.
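The decision made at INQUIRY 220 can be illustrated with a minimal Python sketch. The names `State` and `choose_action` are hypothetical and do not appear in the specification; the sketch only assumes the three states named above (empty, waiting, present) and the stated preference for operand 0 when both operands are unavailable.

```python
from enum import Enum


class State(Enum):
    """The three datum states named in the description."""
    EMPTY = 0
    WAITING = 1
    PRESENT = 2


def choose_action(operand_states):
    """Decide whether the thread continues or is deferred.

    operand_states holds the state indicator of each source operand,
    read before the indicator is updated to its next state.
    Returns ("continue", None) when every operand is present, otherwise
    ("defer", i) where i is the operand the thread will wait on.
    """
    for i, state in enumerate(operand_states):
        if state is not State.PRESENT:
            # The lowest index wins, so operand 0 is preferred over operand 1.
            return ("defer", i)
    return ("continue", None)
```

For instance, `choose_action([State.PRESENT, State.EMPTY])` selects deferral on operand 1, while two unavailable operands select deferral on operand 0.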

In order to suspend execution of a thread, code
offset 105 and index 106 (also referred to as the
compressed continuation descriptor) located within
the local continuation queue are stored in the
datum location (or field) representative of the
unavailable source operand. Each datum 132 may receive
a number of compressed continuation descriptors
corresponding to a number of threads. In one
example, each datum may store four compressed
continuation descriptors.
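As a rough illustration of the suspension mechanism, the following Python sketch stores a compressed continuation descriptor in the datum that is awaiting data. The class and function names are hypothetical, the limit of four descriptors follows the example given above, and raising an error on overflow is purely an assumption, since the text does not say what happens when a datum is full.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Descriptor(NamedTuple):
    """Compressed continuation descriptor: code offset plus index."""
    code_offset: int
    index: int


MAX_WAITERS = 4  # "each datum may store four" descriptors in one example


@dataclass
class Datum:
    state: str = "empty"            # "empty", "waiting" or "present"
    value: object = None
    waiters: list = field(default_factory=list)


def defer_thread(datum, descriptor):
    """Park a thread's descriptor in the datum it is waiting on."""
    if datum.state == "present":
        raise ValueError("datum is available; nothing to wait for")
    if len(datum.waiters) >= MAX_WAITERS:
        # Assumption: overflow handling is not described in the source.
        raise OverflowError("datum cannot hold more descriptors")
    datum.waiters.append(descriptor)
    datum.state = "waiting"
```

Storing the descriptor in the datum itself means no separate wait list is needed: the location that will receive the data also records who is waiting for it.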
When data is to be written to a datum location
132 within local frame cache 90, the result location
and its associated state indicator are specified, as
described in detail above, by frame offset 147 of
destination specifier 126 located within code frame
cache 88, local frame cache physical pointer 97
and any corresponding index 106, STEP 226 "Data
is to be Written" (FIG. 13). In addition to selecting
the location and the state indicator, state function
154 located in destination specifier 126 is used as
an address into state transition table 160 to select
a number of possible next states 166 for the result
location (similar to the procedure described above
for selecting the next state for the source
operands). As described above, subsequent to
selecting the possible next states for the result
location, the current state indicator 148 for that location
is used to choose the next state for the location.
The current state indicator is then updated to
reflect the value of the next state.
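The two-level selection described here (the state function picks a set of candidate next states, and the current state picks one of them) can be sketched in Python as follows. The function names `"read"` and `"write"` and the table contents are illustrative assumptions only; the actual synchronizing functions and transitions are defined elsewhere in the specification.

```python
EMPTY, WAITING, PRESENT = 0, 1, 2

# Hypothetical table: each state function addresses one row of N = 3
# possible next states, indexed by the current state.
STATE_TRANSITION_TABLE = {
    #          current: EMPTY    WAITING  PRESENT
    "read":            (WAITING, WAITING, PRESENT),
    "write":           (PRESENT, PRESENT, PRESENT),
}


def next_state(state_function, current_state):
    """Select the candidate row by state function, then pick by current state."""
    candidates = STATE_TRANSITION_TABLE[state_function]
    return candidates[current_state]


def update_state(state_bits, location, state_function):
    """Replace the current state with the next state; return the old state.

    The old state is returned because the action to be taken is decided
    from the indicator as it was before the update.
    """
    old = state_bits[location]
    state_bits[location] = next_state(state_function, old)
    return old
```

With `state_bits = {5: EMPTY}`, calling `update_state(state_bits, 5, "write")` returns the old state and leaves location 5 marked present.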
In addition to the above, a determination is
made as to whether the location is empty,
INQUIRY 228 "Is Location Empty?" (When data is
to be written, the location is initially read to
determine if anything is stored there before data is
written to the location.)
Should chosen datum 132 be empty (as indicated
by state indicator 148 before it is updated to the
next state), then the data is written to that location,
STEP 230 "Write Data." On the other hand, if the
location is not empty, a determination is made as
to whether there is one or more compressed
continuation descriptors stored within the location and,
therefore, the location is in a waiting state (again,
as indicated by state indicator 148), STEP 231 "Is
Location in Waiting State?" If the location is not in
the waiting state, then the data is written, STEP
234 "Write Data." If, however, that location is in a
waiting state, then any and all compressed
continuation descriptors stored in that location are
removed and enqueued onto the local continuation
queue associated with the running process before
the data is written, STEP 232 "Awaken Compressed
Continuation Descriptors." Subsequent to
removing the compressed continuation descriptors,
the data is written to the indicated location, STEP
234 "Write Data."
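The write path of STEPS 226 to 234 can be summarized in a short Python sketch. The dictionary layout of a datum and the name `write_datum` are hypothetical. Marking the location present after the write is also an assumption made for illustration; in the described system the next state comes from the state transition table.

```python
from collections import deque


def write_datum(datum, value, continuation_queue):
    """Write a value to a datum, awakening any deferred threads first.

    datum is a dict with keys "state", "value" and "waiters", where
    "waiters" holds compressed continuation descriptors.
    """
    if datum["state"] == "waiting":
        # Awaken: move every stored descriptor, in order, onto the
        # local continuation queue before the data is written.
        while datum["waiters"]:
            continuation_queue.append(datum["waiters"].pop(0))
    # Empty, present, or formerly waiting: the data is written.
    datum["value"] = value
    datum["state"] = "present"  # assumed next state for this sketch


queue = deque()
slot = {"state": "waiting", "value": None, "waiters": [(105, 106), (7, 2)]}
write_datum(slot, 42, queue)
```

After the call, both descriptors sit on `queue` ready for execution and `slot` holds the value 42.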

In one specific embodiment, each next state
resident within state transition tables 160, 162, 164
has an associated 3-bit control flag 168 which is
retrieved when the possible next states are
retrieved. When one of the next states is selected, as
described above, the associated control flag 168 is
also selected and is used to indicate what action is
to be taken by the processing element. That is, the
control flag indicates, for example, whether thread
execution is to be continued, whether execution of
the thread is to be deferred or whether a deferred
thread is to be awakened. Each of these actions is
performed in the manner described above.
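One way to picture the control flag is as a value stored alongside each next state in the transition table, so that a single lookup yields both. The Python sketch below is only an illustration: the one-bit-per-action encoding, the function names `"read"` and `"write"`, and the table entries are all assumptions, since the text states only that the flag is three bits wide and selects among continuing, deferring and awakening.

```python
from enum import IntFlag


class Control(IntFlag):
    """Hypothetical encoding of the 3-bit control flag, one bit per action."""
    CONTINUE = 0b001
    DEFER = 0b010
    AWAKEN = 0b100


EMPTY, WAITING, PRESENT = 0, 1, 2

# Each entry pairs a next state with its control flag; rows are selected
# by state function and entries by current state (contents are illustrative).
TABLE = {
    "read": (
        (WAITING, Control.DEFER),                      # current: EMPTY
        (WAITING, Control.DEFER),                      # current: WAITING
        (PRESENT, Control.CONTINUE),                   # current: PRESENT
    ),
    "write": (
        (PRESENT, Control.CONTINUE),                   # current: EMPTY
        (PRESENT, Control.AWAKEN | Control.CONTINUE),  # current: WAITING
        (PRESENT, Control.CONTINUE),                   # current: PRESENT
    ),
}


def step(state_function, current_state):
    """Return the (next_state, control_flag) pair for one table lookup."""
    return TABLE[state_function][current_state]
```

Keeping the flag next to the next state means the action needs no separate decision logic once the table has been consulted.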

Although preferred embodiments have been
depicted and described in detail herein, it will be
apparent to those skilled in the relevant art that
various modifications, additions, substitutions and
the like can be made without departing from the
spirit of the invention and these are, therefore,
considered to be within the scope of the invention as
defined in the following claims.

Claims

1. A method for synchronizing execution by a
processing element of threads within a process,
said method comprising the steps of:
executing a thread within a process;
fetching during said thread execution from a
local frame cache a datum field;
fetching from a state bit cache a state indicator,
said state indicator being associated with
said datum field and having a first state value;
determining based on said first state value
whether said datum field includes a datum
available for use by said thread; and
deferring execution of said thread when said
datum is unavailable.

2. The method of claim 1, further including the
step of determining a second state for said
state indicator, said second state replacing
said first state during said thread execution.

3. The method of claim 2, wherein said second
state determining step includes the steps of:
selecting from said thread a state function to
be used in determining said second state;
using said state function to select from one of
a plurality of tables N possible second states
for said indicator; and
using said first state to choose from said
selected N possible states said second state for
said indicator.

4. The method of claim 3, wherein said first state
represents a current state of said datum and
said second state represents a next state of
said datum.

5. The method of one of claims 1 to 4, wherein 
said thread is represented by a continuation 
descriptor and said deferring step includes the 
step of storing said continuation descriptor 
within said datum field. 



6. The method of claim 5, wherein said continuation
descriptor is compressed before being
stored in said datum field and said datum field
can receive a plurality of said compressed
continuation descriptors.

7. The method of claim 6, further including the
step of awakening said deferred thread when
datum is available for said thread.

8. The method of claim 7, wherein said awakening
step includes the step of removing said
compressed continuation descriptors from said
datum field.

9. The method of claim 8, wherein said removed
continuation descriptors are stored on a queue
local to said processing element.

10. The method of claim 8 or 9, wherein said
awakening step further includes the step of
storing said available datum in said datum field
when said compressed continuation descriptors
are removed.

11. A system for synchronizing execution by a
processing element of threads within a process,
said system comprising:
a local frame cache, said local frame cache
including a datum field;
a state bit cache, said state bit cache including
a state indicator corresponding to said datum
field;
means for executing by said processing element
a thread within a process;
means for fetching from said local frame cache
said datum field and from said state bit cache
said state indicator having a first state value;
means for determining based on said first state
value whether said datum field includes a datum
available for use by said thread; and
means for deferring execution of said thread
when said datum is unavailable.

12. The system of claim 11, further comprising
means for determining a second state for said
state indicator, said second state replacing
said first state during said thread execution.

13. The system of claim 12, wherein said second
state determining means comprises:
means for selecting from said thread a state
function to be used in determining said second
state;
a plurality of tables each having N possible
second states for said indicator;
means for using said state function to select
from one of said plurality of tables said N
possible second states; and
means for using said first state to choose from
said selected N possible states said second
state for said indicator.

14. The system of claim 13, further comprising a
main storage, said main storage comprising a
copy of said datum field and a copy of said
state indicator, said copy of said datum field
being copied from main storage to said local
frame cache and said copy of said state indicator
being copied from said main storage to
said state bit cache.



15. A method for synchronizing execution by a
processing element of threads within a process,
each of said threads including a plurality
of instructions, said method comprising the
steps of:
executing an instruction of said thread;
fetching during said instruction execution from
a local frame cache at least one source
operand and from a state bit cache at least
one state indicator having a first state value,
said at least one state indicator corresponding
to said at least one source operand;
fetching from said instruction at least one state
function associated with said at least one
fetched source operand;
using said at least one state function to select
from one of a plurality of tables N possible
second states for said at least one state indicator,
each of said second states having a
corresponding flag indicator;
using said first state to choose from said
selected N possible states a second state for
said state indicator;
replacing said first state with said second state
during said thread execution; and
having said thread perform one of a plurality of
actions after said instruction execution, said
action being specified by said flag indicator
associated with said chosen second state.

16. The method of claim 15, wherein said plurality
of actions includes the following actions:
continuing execution of said thread, deferring
execution of said thread and awakening a
deferred thread.

17. The method of claim 16, wherein said thread is
represented by a continuation descriptor and
said deferring execution action includes the
step of storing said continuation descriptor
within a source operand.

18. The method of claim 17, wherein said continuation
descriptor is compressed before being
stored and said source operand can receive a
plurality of compressed continuation descriptors.

19. The method of claim 18, wherein said awakening
action includes the step of removing a
compressed continuation descriptor from a
source operand.

20. The method of claim 19, wherein said awakening
action further includes the step of storing a
source operand when said compressed continuation
descriptor is removed.

21. The method of claim 20, wherein said removed
continuation descriptor is stored on a queue
local to said processing element.

22. A system for synchronizing execution by a
processing element of threads within a process,
each of said threads including a plurality
of instructions, said system comprising:
means for executing an instruction of said
thread;
a local frame cache, said local frame cache
including a plurality of source operands;
a state bit cache, said state bit cache including
a plurality of state indicators, each of said state
indicators having a first state value, and
wherein one of said state indicators corresponds
to one of said source operands;
means for fetching during said instruction
execution from said local frame cache at least
one source operand and from said state bit
cache at least one corresponding state indicator;
means for fetching from said instruction at
least one state function associated with said at
least one source operand;
a plurality of tables each having N possible
second states for each of said state indicators;
means for using said at least one state function
to select from one of said plurality of tables
said N possible second states, each of said
second states having an associated flag indicator;
means for using said first state to choose from
said selected N possible states a second state
for said state indicator;
means for replacing said first state with said
second state; and
means for having said thread perform one of a
plurality of actions, said action being specified
by said flag indicator associated with said chosen
second state.